7 - (Linear) Principal Component Analysis [ID:14899]

Good morning everyone and welcome to today's online lecture.

The subject today will be a very important tool in data science called principal

component analysis (PCA).

And before I introduce this important concept, I would like to give you a short example so

that you get some intuition on what we will be working on today.

So let us assume that we have some very simple data.

So this is kind of an academic example.

And this data is just in R2.

So let me draw some axes here.

And these should be at right angles, but never mind.

And let us say we have a couple of data points now in R2, which are distributed like this.

So we have a couple of points up here.

And then we also have a couple of points down here.

And just by looking at this data point distribution, you can already tell, hey, this data seems

to be strongly correlated.

Maybe not only correlated, but it is even proportional.

So there's a strong correlation between the first and the second dimension here.

So what I would like to show you now is that this data distribution is roughly

linear.

And the principal component analysis is one tool to find such strong correlation in data,

find the structure which is given in data, and try to extract the most important features.
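
As a rough illustration of the kind of data sketched here, the following Python snippet generates nearly proportional points in R2 and measures how strongly the two dimensions are correlated. The slope of 2, the noise level, the sample size, and all variable names are assumptions made for this sketch, not values from the lecture. The dominant direction is taken from the eigenvectors of the sample covariance matrix, which is one common way to compute it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice for reproducibility

# Hypothetical data: second coordinate is roughly proportional to the first.
x1 = rng.uniform(-1.0, 1.0, size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=200)  # slope 2 and noise 0.05 are assumptions
X = np.column_stack([x1, x2])

# Pearson correlation between the first and second dimension.
corr = np.corrcoef(X, rowvar=False)[0, 1]

# Center the data and form the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)

# eigh returns eigenvalues in ascending order, so the last column is the
# direction along which the data varies most.
eigvals, eigvecs = np.linalg.eigh(cov)
direction = eigvecs[:, -1]
explained = eigvals[-1] / eigvals.sum()

print("correlation:", corr)
print("dominant direction:", direction)
print("share of variance along it:", explained)
```

For data this close to a line, both the correlation and the share of variance along the dominant direction come out close to 1.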

OK.

So what can we do here?

Let's assume we are not only given some data, but we also know some class labeling like

we had in the case of clustering or for, let's say, a classification task.

And let's say part of this data belongs to class 1.

Say this part down here.

Let's change the color here.

Oops.

That's all.

What about only this part?

It doesn't like.

OK, let me just redraw these in another color.

Where is it?

There.

OK.

So, as before, this is class 1.

And we also would like to have a couple of points up here, which are class 2.

And what we would like to do now is to separate this data.

And of course, as you can see in R2, there are many ways you could do so.

So you could have a linear separation that looks like this, or one that looks

like that.

So whenever a point is on one side of this separating line, you could

say, OK, this is class 1, and on the other side it is class 2.
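
The idea of deciding the class by the side of a separating line can be sketched in Python as follows. The points, the two candidate lines, and the function name `classify` are all hypothetical and chosen only to mirror the drawing; the sketch just shows that more than one line separates the same data.

```python
import numpy as np

# Made-up points: class 1 "down here", class 2 "up here".
class1 = np.array([[-1.0, -2.1], [-0.8, -1.5], [-0.5, -1.1]])
class2 = np.array([[0.5, 0.9], [0.8, 1.7], [1.0, 2.0]])


def classify(points, w, b):
    """Return 1 or 2 depending on which side of the line w.x + b = 0 a point lies."""
    return np.where(points @ w + b < 0, 1, 2)


# Two different separating lines (both are assumptions for illustration).
separators = [
    (np.array([1.0, 1.0]), 0.0),  # the line x1 + x2 = 0
    (np.array([0.0, 1.0]), 0.1),  # the horizontal line x2 = -0.1
]

for w, b in separators:
    print(classify(class1, w, b), classify(class2, w, b))
```

Both candidate lines assign every point to its intended class, which is the ambiguity described above.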

But PCA offers us a kind of tool chain that enables us to simplify the data even further,

such that linear classification becomes more efficient and easier.

So how can we do this?

Let me delete some of this data.

So what PCA does first is it tries to identify the inherent structure of the data.

And as we've seen before, we said, OK, there's a kind of linear correlation in the data.
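
One possible way to read this simplification, offered only as a sketch and not necessarily the construction used later in the lecture, is to project the points onto the dominant direction so that each point is described by a single number, and then separate the classes with a threshold on that number. The data, the labels, the threshold at zero, and the use of the singular value decomposition are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, arbitrary choice

# Hypothetical labelled data along a line: class 1 at negative t, class 2 at positive t.
t = np.concatenate([rng.uniform(-2.0, -0.5, 20), rng.uniform(0.5, 2.0, 20)])
labels = np.where(t < 0, 1, 2)
X = np.column_stack([t, 2.0 * t]) + rng.normal(scale=0.05, size=(40, 2))

# Center the data; the first right singular vector is the dominant direction.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]  # one coordinate per point instead of two

# The sign of a singular vector is arbitrary, so orient the axis using the
# known labels such that class 2 ends up on the positive side.
if scores[labels == 2].mean() < 0:
    scores = -scores

# In one dimension the linear classifier is just a threshold (here: zero).
predicted = np.where(scores < 0, 1, 2)
print("all points classified correctly:", bool((predicted == labels).all()))
```

After the projection, the separating line in R2 is replaced by a single threshold on one coordinate.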

Part of a video series:

Mathematical Data Science 1

Presenters

Prof. Dr. Daniel Tenbrinck

Accessible via

Open access

Duration

00:39:11 Min

Recording date

2020-05-04

Uploaded on

2020-05-04 21:56:30

Language

en-US

 https://www.fau.tv/clip/id/14899
